#### NATIONAL UNIVERSITY OF SINGAPORE

### CS2100 – COMPUTER ORGANISATION

(Semester 2: AY2021/22)

#### **ANSWERS**

Time Allowed: 2 Hours

### **INSTRUCTIONS TO STUDENTS**

- 1. This assessment paper consists of **SEVENTEEN (17)** questions in **THREE (3)** parts and comprises of **THIRTEEN (13)** printed pages.
- 2. Answer ALL guestions on the **ANSWER SHEETS**.
- 3. This is an **OPEN BOOK** assessment.
- 4. Write your answers only on the **ANSWER SHEETS**. You may write in pen or pencil. You are to write within the space provided. No extra pages should be submitted.
- 5. Printed/written materials are allowed. Apart from calculators, electronic devices are not allowed.
- 6. Page 13 contains the MIPS Data Reference sheet.
- 7. The maximum mark of this assessment is 100.

| Question        | Max. mark |  |  |
|-----------------|-----------|--|--|
| Part A: Q1 – 6  | 12        |  |  |
| Part B: Q7 – 12 | 18        |  |  |
| Part C: Q13     | 12        |  |  |
| Part C: Q14     | 16        |  |  |
| Part C: Q15     | 13        |  |  |
| Part C: Q16     | 13        |  |  |
| Part C: Q17     | 16        |  |  |
| Total           | 100       |  |  |

—— END OF INSTRUCTIONS ———

#### Part A: Multiple-Choice Questions [Total: 6×2=12 marks]

Each multiple-choice question (MCQ) is worth **TWO marks** and has exactly **one** correct answer. Please write your answers in **CAPITAL LETTERS**.

1. What is (20.22)<sub>4</sub> in decimal?

```
A. 8.75

B. 8.625

C. 8.5

D. 8.25

(20.22)<sub>4</sub> = 2 \times 4^{1} + 2 \times 4^{-1} + 2 \times 4^{-2} = 8 + 0.5 + 0.125 = 8.625
```

E. None of the above.

2. Consider the following IEEE 754 single-precision floating point number written in hexadecimal: **0x42022000**. What is the number in decimal?

```
A. 130.125

B. 65.0625

C. 32.53125

D. 16.265625

E. None of the above

Standard IEEE754 conversion.

0100 0010 0000 0010 0000 0000 0000

Exponent = 132 - 127 = 5.

Mantissa = 1.0000010001

Value = 1.0000010001×2<sup>5</sup> = 100000.10001 = 32.53125
```

3. Given that the first print (i.e., **printf #1**) is "123 321" (without the quotes) and the second print (i.e., **printf #2**) is "123 654" (without the quotes). What is the correct definition of data assuming that data t data variable is correctly declared?

```
#include <stdio.h>
typedef struct { /* blank for question */ } data t;
void foo(data t);
int main() {
  // data t data is declared correctly here
  printf("%d %d\n", data.x[0], data.y[0]); // printf #1
  foo(data);
  printf("%d %d\n", data.x[0], data.y[0]); // printf #2
  return 0;
}
void foo(data t data) {
                                data.x is unchanged, data.y is changed (array vs
  data.x[0] = 456;
                                pointer inside a struct)
  data.y[0] = 654;
}
```

```
A. typedef struct { int x[1]; int y[1]; } data_t;
B. typedef struct { int *x ; int y[1]; } data_t;
C. typedef struct { int *x ; int *y ; } data_t;
D. typedef struct { int x[1]; int *y ; } data_t;
E. None of the above.
```

- 4. If we were to add "NAND \$rd, \$rs, \$rt" operation into our MIPS instructions, given our processor implementation, what will be the ALUControl signal needed?
  - A. 1101
  - B. 1110

Negative-OR

- C. 1100
- D. 1111
- E. None of the above.
- 5. Given the Boolean function  $F(A,B,C,D) = \sum m(1,3,6,8,9,11,15) + \sum x(7,14)$  where x denotes don't-care, which of the following is <u>not</u> an EPI in the K-map of F?
  - A. C·D
  - B. *B*⋅*C*
  - C. B'·D
  - D.  $A \cdot B' \cdot C'$
  - E. None of the above.



- 6. You are given a single 2×4 decoder with 1-enable and active high output, and no other devices or logic gates. Which of the following expressions <u>cannot</u> be implemented with this single decoder, where A, B and C are Boolean variables?
  - A.  $A \cdot B \cdot C$
  - B.  $A' \cdot B \cdot C$
  - C. *A⋅B'⋅C*
  - D. A'⋅B'⋅C'
  - E. None of the above.

You need at least one of the literals to be uncomplemented in order to use a single 2×4 decoder to implement it.

#### Part B: Multiple-Response Questions [Total: 6×3=18 marks]

Each multiple-response question (MRQ) is worth **THREE marks** and may have one answer or multiple answers. Write out **all** correct answers. For example, if you think that A, B, C are the correct answers, write A, B, C. Please write in **CAPITAL LETTERS**.

Only if you get all the answers correct will you be awarded three marks. **No partial credit will be given for partially correct answers.** 

- 7. Which of the following is **equivalent** to the length of the string s as returned by strlen(s)?
  - A. The index of the first null character in the string s.
  - B. The number specified during the definition of variable s (e.g., 10 in char s[10]).
  - C. The amount of memory allocated to the string s.
  - D. The difference of the address of the first null character in the string s and value of string s.
  - E. Twice the average of the address of the first null character in the string s and the value of string s.

#### Answer: A, D.

Working:

D is correct because if the index is n, then &s[n] - s = &s[n] - &s[0] = n - 0 = n.

8. Aiken is running the following MIPS program. What are the possible number of instructions executed **in total** assuming that the program does not result in an error, if you can start with any starting value of \$s1 and \$s2 and with any possible memory content?

```
addi $t0, $s1
      addi $t1, $zero, 1
      addi $t2, $s2
           $t3, 0($s2)
loop: lb
      andi $t4, $t3
                     , 0x1
      beq $t4, $zero, else
      sll
           $t1, $t1
                    , 1
else: addi $t0, $t0
      addi $s2, $s2
                     , 1
      bne
           $t3, $zero, loop
```

- A. 9
- B. 14
- C. 15
- D. 43
- E. 63

### Answer: A, C, D, E

Working:

Outside of the loop, there are 3 instructions. Inside the loop, it is either 7 or 6 instructions.

9 is possible because we have 3+6

14 is impossible

15 is possible because we have 3+6+6

43 and 63 are possible with many different possibilities

- 9. Consider the instruction "lb \$rt, imm(\$rs)" in general. What is true about this instruction?
  - A. The value specified by **imm** must be a multiple of 4.
  - B. The value in the register specified by **\$rs** must be a multiple of 4.
  - C. The sum of **imm** and the value in the register specified by **\$rs** must be a multiple of 4.
  - D. The sum of **imm** and the value in the register specified by **\$rs** can be any value.
  - E. The value in the register specified by **\$rs** can be any bit pattern.

#### Answer: E.

LB need not be have multiple of 4 but also cannot just be any value because there is no negative address and imm can be negative!

10. A "no-operation" typically written as **nop** is an instruction that tells the processor to do nothing. However, we can simulate this using other instructions. Instead of telling the processor to do nothing, what we want is simply to have the processor produce "no effect". In other words, no registers or memory addresses can have their value changed (i.e., if the value before the operation is n, then the value after the operation is also n).

Which of the following operations are valid and produce no effect? You may assume that any label is valid.

- A. bne **\$0, \$0, label**
- B. addi \$0, \$0, 0
- C. s11 \$2, \$2, 0
- D. add \$0, \$0, \$0
- E. srl \$4, \$4, 0

#### Answer: A, C, E.

- A. Zero is always equal to zero, so the branch is never true.
- B,D. Cannot write to \$zero.
- C,E. Shift by 0 bit is doing nothing.
- 11. Which of the following statements are true about the control signal in our processor implementation and our supported MIPS instructions in Lecture 12?
  - A. If RegWrite = 0, it is actually *possible* to have a different implementation where there are 3 other control signals having the value of "don't care" for at least 1 instruction.
  - B. It is possible to have the value of MemRead equals to MemWrite in some instructions.
  - C. If the value of MemRead is not equal to MemWrite, then the value of ALUSrc is always 1.
  - D. There is an instruction where Branch is the only control signal that is equal to 1.
  - E. sw instruction may have the value of MemRead changed to X since the result is not going to be written into a register.

### Answer: A, B, C.

- A. beq can have RegDst, MemToReg and MemRead = X (but not sw).
- B. MemRead = MemWrite = 0.
- C. Both lw and sw satisfy this.
- D. beq <u>almost</u> satisfies this but ALUop0 is unfortunately not 0.
- E. Cannot, because X implies can be both 0 or 1, in this case it cannot be 1.

- 12. Consider a machine with word size of 4 bytes and 16-bit fixed-length instructions with three types of instructions as shown below. Note that although Type A and Type B have the same number of bits for opcode, they are considered different instruction types because they have different kinds of operands.
  - Type A: 4-bit opcode, 1 address and 1 register;
  - Type B: 4-bit opcode and 1 immediate;
  - **Type C**: 6-bit opcode and 2 registers.

Assume all bits are utilised and all instructions types exists, each <u>address</u> holds a **word** instead of a byte, the immediate field is in 2s complement, all addresses are used and all registers can be encoded. Which of the following statements are true about the instruction?

- A. The maximum number of Type A instructions is 15.
- B. The maximum number of Type C instructions is 56.
- C. There are a total of 512 bytes of memory.
- D. The maximum value of the immediate field is 2047.
- E. The maximum number of registers is 16.

#### Answer: B, C, D.

Type C has 6-bit opcode and 2 registers  $\rightarrow$  each register is 5 bits.

Type A has 4-bit opcode, 1 address and 1 register  $\rightarrow$  each address is 7 bits.

A. 16 - 2 because type B and type C must exist. So only 14 and not 15.

B. 14 for the 4-bit opcodes \* 4 for the extended 2-bit opcodes for a total of 14\*4 = 56.

C.  $2^7$  address = 128 address but each address holds a word so a total of 128\*4 = 512 bytes.

D. 12 bits for immediate, in 2s complement, so  $2^{11} - 1 = 2047$ .

E. 5 bits for register, so should be 32 registers and not 16.

### Part C: There are 5 questions in this part [Total: 70 marks]

#### Q13. Sequential circuits [12 marks]

(a) A sequential circuit with 6 states: state 1 (ABC=001 $_2$ ) through state 6 (ABC=110 $_2$ ) is implemented using three JK flip-flops as shown below. Complete the state diagram on the Answer Sheets. One of the transitions on the state diagram has been drawn for you. [5 marks]



- (b) Is the circuit self-correcting? Explain. (Mark will not be awarded if no explanation is given or the explanation given is incomplete/incorrect.) [1 mark]
- (c) Redesign the circuit using only **T flip-flops**. You do not have to follow where the unused states transit to in the given circuit above. That is, you only need to make sure that the transitions from the used states follow the above circuit. You do not need to draw your new design. Write out the flip-flop input functions *TA*, *TB* and *TC* so that your new design can be implemented with the fewest number of logic gates other than the flip-flops. [6 marks]

#### Q14. Combinational circuits [16 marks]

Note that logical constants 0 and 1 are available, but not complemented literals.

(a) Given the following circuit which comprises a  $2\times4$  decoder with 1-enable and active high outputs and a 4:1 multiplexer, write out the sum-of-minterms expression of F(A,B,C,D) in the  $\Sigma m$  notation. [4 marks]



(b) The circuit below comprises a 2-bit parallel adder and 2 XOR gates. The circuit is housed inside a chip so the only connections available are those that extend out of the dotted box.

Using this circuit and one additional logic gate (inverter, AND, OR, NAND, NOR, XOR, or XNOR), implement the function *F* in part (a) above. [4 marks]



(c) [Total: 8 marks]

Given this Boolean function:  $G(A,B,C,D) = \Sigma m(0, 4, 6, 8, 9, 10, 12, 13, 14, 15)$ ,

(i) How many PIs and EPIs are there in the K-map of G?

[2 marks]

(ii) Write the simplified SOP expression for G.

[2 marks]

(iii) Implement *G* using at most two 4-bit magnitude comparators and a 2-input OR gate. The block diagram of a 4-bit magnitude comparator is shown below. [4 marks]



Page **8** of **21** 

#### Q15. MIPS [13 marks]

Study the following MIPS code on integer arrays A and B, with array B containing twice as many elements as array A. The following are the variable mappings:

- \$a0 = size (number of elements in array A)
- \$a1 = base address of array A
- \$a2 = base address of array B

```
.data
size: .word 5
A: .word 1, 2, 3, 4, 5
B: .word 5, 3, 4, 5, 3, 8, 2, 5, 9, 4
.text
main: la
           $t0, size
                         # $t0 is the address of size
           $a0, 0($t0)
                          # $a0 is the content of size
      lw
      la
           $a1, A
                         # $a1 is the base address of array A
                    # $a2 is the base address of array B
      la
           $a2, B
                      add $t0, $0, $0  # I1; i = 0
addi $t1, $a1, 0  # I2; $t1 = &A[0]
addi $t2, $a2, 0  # I3; $t2 = &B[0]
      sll
           $t3, $a0, 2
                          # I4
loop: slt
           $t4, $t0, $t3 # I5
      beq $t4, $0, end
                          # 16
      lw
           $s1, 0($t1)
                          # I7
      lw
           $s2, 0($t2)
                          # 18
      slt $t4, $s1, $s2
                          # 19
      beq $t4, $0, skip
                         # I10
                          # I11
           $t9, $s1, $0
      add
      add
           $s1, $s2, $0
                          # I12
      add $s2, $t9, $0
                          # I13
skip: sw
           $s1, 0($t1)
                          # I14
                          # I15
           $s2, 0($t2)
      addi $t0, $t0, 4
                          # I16
      addi $t1, $t1, 4
                         # 117
      addi $t2, $t2, 8
                         # I18
           loop
                          # I19
      j
           $v0, 10
end:
      li
                      # system call code for exit
      syscall
```

# Q15. (continue...)

code outside of I11 to I15.

| (a) | Write out the contents of array B after the execution of the MIPS code.                                                                                                                                                                       | [2] |
|-----|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| (b) | Using the variable names ( <i>size</i> , <i>A</i> , <i>B</i> ) shown in the variable mappings above, write an equivalent C code corresponding to instructions I1 to I19 in the above MIPS code. You may use additional variable(s) if needed. |     |
|     | Do not do a line-by-line direct translation of the MIPS code. You do not need to declare the variables in your C code.                                                                                                                        | [4] |
| (c) | Write the instruction encoding of instruction I5 (slt $$t4$ , $$t0$ , $$t3$ ) in hexadecimal. The value in shamt for slt instructions is zero.                                                                                                | [2] |
| (d) | Assuming that I19 (j loop) is stored at address $0x0040\ 0084$ , (i) calculate the address of I5 (slt \$t4, \$t0, \$t3) and (ii) write the instruction encoding of I19 (j loop) in hexadecimal.                                               | [2] |
| (e) | Change the code from I11 to I15 to make the code more efficient. Make the changes on the Answer Sheets. Except for moving the labels if necessary, you are not to change the                                                                  |     |

[3]

#### Q16. Pipelining [13 marks]

Refer to the following MIPS code which is the same as the one in question 15. Here, we look only at instructions I1 to I19. Pay attention to the assumptions (underlined) given below.

```
$t0, $0, $0
                            # I1; i = 0
      add
                            # 12; $t1 = &A[0]
      addi $t1, $a1, 0
      addi $t2, $a2, 0
                            # I3; $t2 = &B[0]
            $t3, $a0, 2
      sll
                            # I4
loop: slt
            $t4, $t0, $t3
                            # I5
            $t4, $0, end
                            # 16
      beq
            $s1, 0($t1)
                              I7
      lw
      lw
            $s2, 0($t2)
                            # I8
      slt
            $t4, $s1, $s2
                            # 19
                            # I10
            $t4, $0, skip
      beq
            $t9, $s1, $0
                            # I11
      add
      add
            $s1, $s2, $0
                            # I12
            $s2, $t9, $0
      add
                            # I13
            $s1, 0($t1)
                            # I14
skip: sw
            $s2, 0($t2)
      sw
                            # I15
      addi $t0, $t0, 4
                            # I16
      addi $t1, $t1, 4
                            # I17
      addi $t2, $t2, 8
                            # I18
                            # I19
           loop
      j
end:
```

Assuming a 5-stage MIPS pipeline, and <u>all elements in array A are smaller than all elements in array B</u>, answer the parts below. You need to count until the last stage of instruction I19.

(a) How many cycles does this code segment take to complete its execution in the first iteration (I1 to I19) in an ideal pipeline, that is, one with no delays? [2]

For parts (b) to (d) below, given the assumption for each part, how many <u>additional cycles</u> does this code segment (I1 to I19) take to complete its execution in the first iteration as compared to an ideal pipeline computed in (a)? Note that the jump instruction (j) computes the target address to jump to in its ID stage (stage 2). No delayed branching is used.

Write the total number of <u>additional</u> delay cycles for each of the parts (b) to (d). For example, if part (a) takes 10 cycles and part (b) takes 30 cycles, then you should write +20 for part (b).

- (b) Assuming <u>without forwarding and branch decision is made at MEM stage (stage 4)</u>.

  No branch prediction is made.

  [3]
- (c) Assuming <u>with forwarding and branch decision is made at MEM stage (stage 4)</u>.

  No branch prediction is made.

  [3]
- (d) Assuming <u>with forwarding and branch decision is made at ID stage (stage 2)</u>.

  Branch is predicted not taken. [3]
- (e) Assuming the setting in part (b) above (without forwarding and branch decision at MEM stage), without affecting the correctness of the code, is it possible to move <u>one</u> instruction to somewhere else to reduce the number of delay cycles? If so, indicate which instruction to move, where to move it to, and how many delay cycles are reduced by moving it. If it is not possible, explain. [2]

#### Q17. Cache [16 marks]

Refer to the following MIPS code which is the same as the one in question 15. Here, we look only at instructions I1 to I19. The data segment in the MIPS code in question 15 no longer applies here as the arrays contain a lot more elements in this question.

```
$t0, $0, $0
                           # I1; i = 0
      add
      addi $t1, $a1, 0
                           # I2; $t1 = &A[0]
      addi $t2, $a2, 0
                           # 13; $t2 = &B[0]
           $t3, $a0, 2
                           # I4
           $t4, $t0, $t3
                           # I5
loop: slt
           $t4, $0, end
                           # 16
      beq
           $s1, 0($t1)
                           # I7
      lw
           $s2, 0($t2)
                           # I8
      lw
           $t4, $s1, $s2
      slt
                           # 19
           $t4, $0, skip
                           # I10
      beq
      add
           $t9, $s1, $0
                           # I11
           $s1, $s2, $0
                           # I12
      add
      add
           $s2, $t9, $0
                           # I13
           $s1, 0($t1)
                           # I14
skip: sw
           $s2, 0($t2)
                           # I15
      sw
      addi $t0, $t0, 4
                           # I16
      addi $t1, $t1, 4
                           # I17
      addi $t2, $t2, 8
                           # I18
           loop
                           # I19
end:
```

For parts (a) and (b): You are given a **2-way set-associative instruction cache** with 16 words in total. The replacement policy is **LRU** (least recently used). You may assume the following:

- There are **100 elements** in array A and twice as many elements in array B.
- All elements in array A are smaller than all elements in array B.
- Instruction I1 is stored at address 0x4488 FFFC.
- Consider only instructions I1 to I19 in the execution of the code.
- (a) Assume that each block contains 2 words.
  - (i) How many bits are there in the set index field? In the byte offset field? [2]
  - (ii) How many misses in total are there in the execution of the code? [2]
- (b) Assume that each block contains 4 words.

How many misses in total are there in the execution of the code? [2]

#### Q17. Cache (continue...)

For parts (c) to (e): You are given a **direct-mapped data cache** with 1024 words in total and each block contains 16 words. Recall that array *B* contains twice as many elements as array *A*. You may assume the following:

- Array A starts at address 0xFFFF 0000.
- Array B follows immediately after array A in the memory. That is, if the last element of array A is at address x, then the first element of array B is at address (x + 4).
- Only **lw** instructions are considered for the calculation of hits and misses; **sw** instructions are to be excluded from the calculation.
- (c) How many bits are there in the index field? In the byte offset field? [2]
- (d) Assuming that array A contains **512** elements, how many data access hits in total are in the data cache in the execution of the code (i) for array A and (ii) for array B? [4]
- (e) Assuming that array A contains **1024** elements, how many data access hits in total are in the data cache in the execution of the code (i) for array A and (ii) for array B? [4]

=== END OF PAPER ===

#### MIPS Reference Data CORE INSTRUCTION SET OPCODE / FUNCT NAME, MNEMONIC OPERATION (in Verilog) (Hex) (1) 0/20<sub>bex</sub> Add R R[rd] = R[rs] + R[rt]add 8<sub>bex</sub> Add Immediate addi 1 R[rt] = R[rs] + SignExtImm (1,2)Add Imm. Unsigned add1u R[rt] = R[rs] + SignExtImm 9<sub>hex</sub> (2) Add Unsigned 0/21hex R[rd] = R[rs] + R[rt]0/24<sub>bex</sub> And and R[rd] = R[rs] & R[rt]And Immediate R[rt] = R[rs] & ZeroExtImm andi (3) Chex if(R[rs]=R[rt])Branch On Equal PC=PC+4+BranchAddr if(RIrsl!-RIrtl) Branch On Not Equal bno PC=PC+4+BranchAddr (4) 2<sub>bex</sub> Jump J PC=JumpAddr (5) 1 Jump And Link R[31]=PC+8;PC=JumpAddr (5) 3<sub>hex</sub> jal 1 0/08<sub>hex</sub> Jump Register jr R PC=Rirs1 R[rt] {24'b0,M[R[rs] Load Byte Unsigned 1bu 24<sub>hex</sub> +SignExtImm](7:0)} (2) Load Halfword $R[rt]=\{16^{\circ}b0,M[R[rs]]$ Thu +SignExtImm](15:0)} Unsigned (2)Load Linked 11 R[rt] = M[R[rs] + SignExtImm]30<sub>hex</sub> Load Upper Imm. $R[rt] = \{imm, 16'b0\}$ fhex Iut Load Word 23<sub>hex</sub> R[rt] = M[R[rs] + SignExtImm]0 / 27<sub>hex</sub> Nor nor $R[rd] = \sim (R[rs] | R[rt])$ 0 / 25<sub>hex</sub> R[rd] = R[rs] | R[rt]10 dhex Or Immediate ori R[rt] = R[rs] | ZeroExtImm0 / 2a<sub>hex</sub> R[rd] = (R[rs] < R[rt]) ? 1 : 0Set Less Than sit R[rt] = (R[rs] < SignExtImm)? 1:0(2)Set Less Than Imm. siti a<sub>hex</sub> Set Less Than Imm. R[rt] = (R[rs] < SignExtImm)? 1:0 sitiu bhex Unsigned (2,6)Set Less Than Unsig. situ (6) 0/2bhex $R ext{ } R[rd] = (R[rs] < R[rt]) ? 1 : 0$ Shift Left Logical 811 R[rd] = R[rt] << shamt0/00<sub>hex</sub> 0 / 02<sub>hex</sub> Shift Right Logical sr1 R R[rd] = R[rt] >> shamtM[R[rs]+SignExtImm](7:0) =Store Byte sh 28<sub>hex</sub> R[rt](7:0) (2) M[R[rs]+SignExtImm] = R[rt];38<sub>bex</sub> Store Conditional sc R[rt] = (atomic)?1:0(2.7)M[R[rs]+SignExtImm](15:0) =Store Halfword 29<sub>hex</sub> sh (2) R[n](15:0) Store Word M[R[rs]+SignExtImm] = R[rt](2) 2b<sub>bex</sub> (1) 0/22<sub>hex</sub> R[rd] = R[rs] - R[rt]Subtract sub R Subtract Unsigned $R ext{ } R[rd] = R[rs] - R[rt]$ 0/23<sub>bex</sub> (1) May cause overflow exception (2) SignExtImm = { 16{immediate[15]}, immediate } (3) ZeroExtImm = { 16{1b'0}, immediate } (4) BranchAddr = { 14{immediate, 2'b0 } (5) JumpAddr = { PC+4[31:28], address, 2'b0 } (6) Operands considered unsigned numbers (vs. 2's comp.) (7) Atomic test&set pair; R[rt] = 1 if pair atomic, 0 if not atomic BASIC INSTRUCTION FORMATS opcode rs rd shamt funct opcode rt immediate MIPS opcode address

| ARITHMETIC CO         | RE INS  | TRU   | CTION SET ②                                                        | OPCODE        |
|-----------------------|---------|-------|--------------------------------------------------------------------|---------------|
|                       |         |       | 0.50                                                               | / FMT/F       |
|                       |         | FOR-  |                                                                    | / FUNCI       |
| NAME, MNEMO           |         | MAT   |                                                                    | (Hex)         |
| Branch On FP True     |         | FI    | if(FPcond)PC=PC+4+BranchAddr (4                                    | THE PERSONNEL |
| Branch On FP False    | bclf    | FI    | if(!FPcond)PC=PC+4+BranchAddr(4                                    |               |
| Divide                | div     | R     | Lo R[rs]/R[rt]; Hi R[rs]%R[rt]                                     | 0/-/-/1       |
| Divide Unsigned       | divu    | R     | Lo=R[rs]/R[rt]; Hi=R[rs]%R[rt] (6                                  | ) 0//-/1      |
| FP Add Single         | add.s   | FR    | F[fd] = F[fs] + F[ft]                                              | 11/10//       |
| FP Add<br>Double      | add.d   | FR    | ${F[fd],F[fd+1]} = {F[fs],F[fs+1]} + {F[n],F[n+1]}$                | 11/11//       |
| FP Compare Single     | cxs*    | FR    | FPcond = (F[fs] op F[ft]) ? 1:0                                    | 11/10//       |
| FP Compare<br>Double  | c.x.d*  | FR    | FPcond = ({F[fs],F[fs+1]} op<br>{F[ft],F[ft+1]}) ? 1:0             | 11/11/-/      |
| * (x is eq, 1t, 0     | T 1e) ( | op is | , <, or < ) (y is 32, 3c, or 3e)                                   |               |
| FP Divide Single      | div.s   | FR    | F[fd] = F[fs] / F[ft]                                              | 11/10/-/      |
| FP Divide<br>Double   | div.d   | FR    | ${F[fd],F[fd+1]} = {F[fs],F[fs+1]} / {F[ft],F[ft+1]}$              | 11/11//       |
| FP Multiply Single    | mul.s   | FR    | F[fd] = F[fs] * F[ft]                                              | 11/10//       |
| FP Multiply<br>Double | mul.d   | FR    | ${F[fd],F[fd+1]} = {F[fs],F[fs+1]} * {F[ft],F[ft+1]}$              | 11/11//       |
| FP Subtract Single    | sub.s   | FR    | F[fd]=F[fs] - F[ft]                                                | 11/10//       |
| FP Subtract<br>Double | sub.d   | FR    | ${F[fd],F[fd+1]} = {F[fs],F[fs+1]} - {F[fl],F[fl+1]}$              | 11/11//       |
| Load FP Single        | 1wc1    | 1     | F[rt]=M[R[rs]+SignExtImm] (2                                       | 31/-/-/-      |
| Load FP<br>Double     | 1dc1    | 1     | F[rt]=M[R[rs]+SignExtImm]; (2<br>F[rt+1]=M[R[rs]+SignExtImm+4]     | 35///-        |
| Move From Hi          | mrhi    | R     | R[rd] = Hi                                                         | 0 //-/10      |
| Move From Lo          | mrio    | R     | R[rd] = Lo                                                         | 0/-/-/1       |
| Move From Control     | mfc0    | R     | R[rd] = CR[rs]                                                     | 10 /0//       |
| Multiply              | mult    | R     | ${Hi,Lo} = R[rs] * R[rt]$                                          | 0///13        |
| Multiply Unsigned     | multu   | R     |                                                                    | 0///19        |
| Shift Right Arith.    | sra     | R     | R[rd] = R[rt] >>> shamt                                            | 0///          |
| Store FP Single       | swc1    | 1     | M[R[rs]+SignExtImm] = F[rt] (2                                     | 39/-/-/-      |
| Store FP<br>Double    | sdc1    | 1     | M[R[rs]+SignExtImm] = F[rt]; (2<br>M[R[rs]+SignExtImm+4] = F[rt+1] |               |

#### FLOATING-POINT INSTRUCTION FORMATS

| FR | opcode | fmt     | ft   | fs    | fd        | funct |
|----|--------|---------|------|-------|-----------|-------|
|    | 31 26  | 25 21.2 | 0 16 | 15 11 | 10 6      | 5 0   |
| FI | opcode | fmt     | n    |       | immediate |       |
|    | 31 26  | 25 21 2 | 0 16 | 15    |           | 0     |

#### PSEUDOINSTRUCTION SET

| NAME                         | MNEMONIC | OPERATION                        |
|------------------------------|----------|----------------------------------|
| Branch Less Than             | blt      | if(R[rs] < R[rt]) PC = Label     |
| Branch Greater Than          | bgt      | if(R[rs]>R[rt]) PC = Label       |
| Branch Less Than or Equal    | ble      | $if(R[rs] \le R[rt]) PC = Label$ |
| Branch Greater Than or Equal | bge      | if(R[rs] >= R[rt]) PC = Label    |
| Load Immediate               | 11       | R[rd] = immediate                |
| Move                         | move     | R[rd] = R[rs]                    |
|                              |          |                                  |

#### REGISTER NAME, NUMBER, USE, CALL CONVENTION

| NAME      | NUMBER | USE                                                      | PRESERVED ACROSS<br>A CALL? |
|-----------|--------|----------------------------------------------------------|-----------------------------|
| \$zero    | 0      | The Constant Value 0                                     | N.A.                        |
| Sat       | - 1    | Assembler Temporary                                      | No                          |
| \$v0-\$v1 | 2-3    | Values for Function Results<br>and Expression Evaluation | No                          |
| \$a0-\$a3 | 4-7    | Arguments                                                | No                          |
| \$10-\$17 | 8-15   | Temporaries                                              | No                          |
| \$s0-\$s7 | 16-23  | Saved Temporaries                                        | Yes                         |
| \$18-\$19 | 24-25  | Temporaries                                              | No                          |
| \$k0-\$k1 | 26-27  | Reserved for OS Kernel                                   | No                          |
| \$gp      | 28     | Global Pointer                                           | Yes                         |
| Ssp       | 29     | Stack Pointer                                            | Yes                         |
| \$fp      | 30     | Frame Pointer                                            | Yes                         |
| \$ra      | 31     | Return Address                                           | No                          |

Copyright 2009 by Elsevier, Inc., All rights reserved. From Patterson and Hennessy, Computer Organization and Design, 4th ed.

1

# **Part C: Answers and Workings**

## Q13. Sequential Circuit (12 marks)

(a)



| Pre | sent s | tate | N       | ext state | e              | Flip-flop inputs |      |      |      |        |              |
|-----|--------|------|---------|-----------|----------------|------------------|------|------|------|--------|--------------|
| Α   | В      | С    | $A^{+}$ | B⁺        | C <sup>+</sup> | JA=C'            | KA=C | JB=A | KB=1 | JC=A·B | <i>KC</i> =0 |
| 0   | 0      | 0    | 1       | 0         | 0              | 1                | 0    | 0    | 1    | 0      | 0            |
| 0   | 0      | 1    | 0       | 0         | 1              | 0                | 1    | 0    | 1    | 0      | 0            |
| 0   | 1      | 0    | 1       | 0         | 0              | 1                | 0    | 0    | 1    | 0      | 0            |
| 0   | 1      | 1    | 0       | 0         | 1              | 0                | 1    | 0    | 1    | 0      | 0            |
| 1   | 0      | 0    | 1       | 1         | 0              | 1                | 0    | 1    | 1    | 0      | 0            |
| 1   | 0      | 1    | 0       | 1         | 1              | 0                | 1    | 1    | 1    | 0      | 0            |
| 1   | 1      | 0    | 1       | 0         | 1              | 1                | 0    | 1    | 1    | 1      | 0            |
| 1   | 1      | 1    | 0       | 0         | 1              | 0                | 1    | 1    | 1    | 1      | 0            |

(b) State 0 transits to state 4 and state 7 transits to state 1, therefore the circuit is self-correcting.

(c) 
$$TA = A \odot C$$
 (or A XNOR C);  $TB = A + B$ ;  $TC = A \cdot B$ 

| Pre | esent st | ate | Next state |         |                | Flip-flop inputs |    |    |
|-----|----------|-----|------------|---------|----------------|------------------|----|----|
| Α   | В        | С   | $A^{+}$    | $B^{+}$ | C <sup>+</sup> | TA               | TB | TC |
| 0   | 0        | 0   | Χ          | Χ       | Х              | Χ                | Х  | Χ  |
| 0   | 0        | 1   | 0          | 0       | 1              | 0                | 0  | 0  |
| 0   | 1        | 0   | 1          | 0       | 0              | 1                | 1  | 0  |
| 0   | 1        | 1   | 0          | 0       | 1              | 0                | 1  | 0  |
| 1   | 0        | 0   | 1          | 1       | 0              | 0                | 1  | 0  |
| 1   | 0        | 1   | 0          | 1       | 1              | 1                | 1  | 0  |
| 1   | 1        | 0   | 1          | 0       | 1              | 0                | 1  | 1  |
| 1   | 1        | 1   | Χ          | Χ       | Χ              | Χ                | Х  | Χ  |

#### Q14. Combinational Circuits (16 marks)

(a)  $F = \Sigma m(0, 5, 10, 15)$  (4 marks)

| Α | В | С | D | 10 | 11 | 12 | 13 | F |
|---|---|---|---|----|----|----|----|---|
| 0 | 0 | 0 | 0 | 1  | 0  | 0  | 0  | 1 |
| 0 | 0 | 0 | 1 | 1  | 0  | 0  | 0  | 0 |
| 0 | 0 | 1 | 0 | 1  | 0  | 0  | 0  | 0 |
| 0 | 0 | 1 | 1 | 1  | 0  | 0  | 0  | 0 |
| 0 | 1 | 0 | 0 | 0  | 1  | 0  | 0  | 0 |
| 0 | 1 | 0 | 1 | 0  | 1  | 0  | 0  | 1 |
| 0 | 1 | 1 | 0 | 0  | 1  | 0  | 0  | 0 |
| 0 | 1 | 1 | 1 | 0  | 1  | 0  | 0  | 0 |
| 1 | 0 | 0 | 0 | 0  | 0  | 1  | 0  | 0 |
| 1 | 0 | 0 | 1 | 0  | 0  | 1  | 0  | 0 |
| 1 | 0 | 1 | 0 | 0  | 0  | 1  | 0  | 1 |
| 1 | 0 | 1 | 1 | 0  | 0  | 1  | 0  | 0 |
| 1 | 1 | 0 | 0 | 0  | 0  | 0  | 1  | 0 |
| 1 | 1 | 0 | 1 | 0  | 0  | 0  | 1  | 0 |
| 1 | 1 | 1 | 0 | 0  | 0  | 0  | 1  | 0 |
| 1 | 1 | 1 | 1 | 0  | 0  | 0  | 1  | 1 |

(b) The circuit is a 2-bit adder-cum-substractor. If Cin is set to 0, it performs X + Y. If Cin is 1, it performs X - Y.

Since F=1 when AB=CD, or AB-CD=0, the solution is shown below. (Other answers possible).



- (c) (8 marks)
  - (i) 5 PIs and 5 EPIs. (2 marks)
  - (ii)  $A \cdot B + C' \cdot D' + A \cdot C' + B \cdot D' + A \cdot D'$  (2 marks)
  - (iii)  $G = AB \ge DC$  (4 marks)





Other answers are acceptable, eg:  $ABC \ge DCB$ ,  $ABCD \ge DCBA$ , and others. Two magnitude comparators are given in case students cannot see the obvious answers and try out other answers.

#### Q15. MIPS (13 marks)



```
(b) (4 marks)
    // $a0 = size; $a1 = arrayA; $s2 = arrayB
    for (i = 0; i < size; i++) {
        if (A[i] < B[2*i]) {
            t = A[i];
            A[i] = B[2*i];
            B[2*i] = t;
        }
}</pre>
```

```
; 24: addi $t2, $a2, 0 # I3; $t2 =
[00400044] 20ca0000 addi $10, $6, 0
[00400048] 00045880 sll $11, $4, 2
                                               ; 25: s11 $t3, $a0, 2 # I4; $t3 = s
                     slt $12, $8, $11
                                               ; 26: slt $t4, $t0, $t3 # I5; i
[0040004c] 010b602a
                     beq $12, $0, 56 [end-0x00400050]
[00400050] 1180000e
                     lw $17, 0($9)
[00400054] 8d310000
                                               ; 28: lw $s1, 0($t1) # 17; $s1 = A[
                                               ; 29: 1w $s2, 0($t2) # 18; $s2 = B[
                     lw $18, 0($10)
[00400058] 8d520000
[0040005c] 0232602a
                                               ; 30: slt $t4, $s1, $s2 # 19
                     slt $12, $17, $18
[00400060] 11800004
                     beq $12, $0, 16 [skip-0x00400060]
[00400064] 0220c820
                     add $25, $17, $0
                                              ; 32: add $t9, $s1, $0 # Ill
[00400068] 02408820
                     add $17, $18, $0
                                               ; 33: add $s1, $s2, $0 # I12
[0040006c] 03209020
                     add $18, $25, $0
                                               ; 34: add $s2, $t9, $0 # I13
                     sw $17, 0($9)
sw $18, 0($10)
[00400070] ad310000
                                               ; 35: sw $s1, 0($t1) # I14
[00400074] ad520000
                                               ; 36: sw $s2, 0($t2) # II5
[00400078] 21080004
                     addi $8, $8, 4
                                               ; 37: addi $t0, $t0, 4 # I16
[0040007c] 21290004
                     addi $9, $9, 4
                                               ; 38: addi $t1, $t1, 4 # I17
[00400080] 214a0008 addi $10, $10, 8
                                               ; 39: addi $t2, $t2, 8 # I18
                                              ; 40: j loop # 119
. 40: li Sun 10 # evetem call code
[00400084] 08100013 j 0x0040004c [loop]
Innannneel 34nonnna
                     ori $2 $0 10
```

# 

### (e) (3 marks)

```
$t9, $s1, $0
       add
                               # I11
            $s1, $s2, $0
       add
                              # I12
       add $s2, $t9, $0
                              # I13
             $s1, 0(<del>$t1</del>$t2)
skip: sw
                                  # I14
             $s2, 0(<del>$t2</del>$t1)
                                  # I15
       sw
skip: addi $t0, $t0, 4
                               # I16
```

Other answers possible.

#### Q16. Pipelining (13 marks)

- (a) (2 marks) 19 + 5 1 = 23 cycles. Delays are highlighted under the columns (b), (c), (d) for parts (b),(c),(d) below respectively.
- (b) (3 marks) +17 cycles.
- (c) (3 marks) +7 cycles.
- (d) (3 marks) +3 cycles.

```
(b)
                                         (c)
                                             (d)
            $t0, $0, $0
                            # I1
      add
      addi $t1, $a1, 0
                            # I2
      addi $t2, $a2, 0
                            # I3
            $t3, $a0, 2
      sll
                            # I4
loop: slt
            $t4, $t0, $t3 # I5
                                    +2
            $t4, $0, end
                                    +2
                                             +1
      beq
                            # 16
            $s1, 0($t1)
                            # I7
                                    +3
                                         +3
      lw
      lw
            $s2, 0($t2)
                            # I8
            $t4, $s1, $s2
                           # I9
                                    +2
                                             +1
      slt
                                         +1
            $t4, $0, skip
                           # I10
                                    +2
                                             +1
      beq
            $t9, $s1, $0
      add
                            # I11
                                    +3
                                         +3
            $s1, $s2, $0
      add
                            # I12
      add
            $s2, $t9, $0
                            # I13
                                    +1
skip: sw
            $s1, 0($t1)
                            # I14
                                    +1
            $s2, 0($t2)
                            # I15
                                    +1
      SW
      addi $t0, $t0, 4
                            # I16
      addi $t1, $t1, 4
                            # I17
      addi $t2, $t2, 8
                            # I18
            loop
                            # I19
      j
end:
                             Total:
                                         +7
                                             +3
                                    +17
```

#### (e) (2 marks)

Move I4 (sll \$t3, \$a0, 2) to above I2 to remove 2 cycles of delay at I5.

Many possible other answers are acceptable as long as they don't invalidate the code.

#### Q17. Cache (16 marks)

(a) (2 marks) (i) Set index: 2 bits; byte offset: 3 bits.

 $16/2/2 = 4 \text{ sets } \rightarrow 2 \text{ bits}$  for set index field; 8bytes (2 words) per block  $\rightarrow 3 \text{ bits}$  for byte offset. (2 marks) (ii) 10 misses.

Address of I1 = 0x4488 FFFC = 0b.... 1111 1100. So I1 is stored in set 3, word 1.

Execution: 1<sup>st</sup> iteration: 10 misses. No misses in subsequent iterations.

| Set 0 | 12 (M)<br>118 (M) | 13 (H)<br>119 (H) | I10 (M) | I11 (H) |
|-------|-------------------|-------------------|---------|---------|
| Set 1 | 14 (M)            | 15 (H)            | I12 (M) | I13 (H) |
| Set 2 | 16 (M)            | 17 (H)            | I14 (M) | I15 (H) |
|       |                   | I1 (M)            | 18 (M)  | I9 (H)  |
| Set 3 | I16 (M)           | I17 (H)           |         |         |

(b) (2 marks) Set index: 1 bit; byte offset: 4 bits.

16/4/2 = 2 sets → 1 bit for set index field; 16 bytes (4 words) per block → 4 bits for byte offset. 303 (6+99×3) misses.

Address of I1 = 0x4488 FFFC = 0b.... 1111 1100. So I1 is stored in set 1, word 3.

Execution: 1<sup>st</sup> iteration:

| Set 0 | I2 (M)<br>I18 (M) | 13 (H)<br>119 (H) | 14 (H)  | 15 (H) | I10 (M) | I11 (H) | I12 (H) | I13 (H) |
|-------|-------------------|-------------------|---------|--------|---------|---------|---------|---------|
| Set 1 | I14 (M)           | I15 (H)           | I16 (H) |        | 16 (M)  | 17 (H)  | 18 (H)  | 19 (H)  |

6 misses in the first iteration. In subsequent iterations, I5, I10 and I18 are misses.

- (c) (2 marks) Index: 6 bits; byte offset: 6 bits.
   1024/16 = 64 blocks → 6 bits for index field; 64 bytes (16 words) per block → 6 bits for byte offset.
- (d)  $(4 \text{ marks}) 473 (31 \times 15 + 8)$  for array A and 441  $(31 \times 14 + 7)$  for array B. (2 marks each)

Array A (address 0xFFFF0000) is loaded into the cache starting at block 0, array B (address 0xFFFF0800) is loaded into the cache starting at block 32.

For the first 31 block accesses for array A (first 62 block accesses for array B), array A has 15 hits for every 16 accesses, and array B has 14 hits for every 16 accesses.

For the last block access for array A, the hits and misses are as follows (cache thrashing occurs when B[1008:1022] are loaded into the same block as A[504:511]):

| Α | 496 (M)  | 497 (H)  | 498 (H)  | 499 (H)  | 500 (H)  | 501 (H)  | 502 (H)  | 503 (H)  |
|---|----------|----------|----------|----------|----------|----------|----------|----------|
| В | 992 (M)  | 994 (H)  | 996 (H)  | 998 (H)  | 1000 (H) | 1002 (H) | 1004 (H) | 1006 (H) |
|   |          |          |          |          |          |          |          |          |
| Α | 504 (H)  | 505 (M)  | 506 (M)  | 507 (M)  | 508 (M)  | 509 (M)  | 510 (M)  | 511 (M)  |
| В | 1008 (M) | 1010 (M) | 1012 (M) | 1014 (M) | 1016 (M) | 1018 (M) | 1020 (M) | 1022 (M) |

Common mistake: Not accounting for the cache thrashing at the end: 480 for array *A*, 448 for array *B*.

(e) (4 marks) **945** (**7+62**×**15+8**) for array *A* and **882** (**7+62**×**14+7**) for array *B*. (2 marks each)

Array A (address 0xFFFF0000) is loaded into the cache starting at block 0, array B (address 0xFFFF1000) is loaded into the cache starting at block 0 as well.

For the first block access for array A (first 2 block accesses for array B), the hits and misses are as follows (cache thrashing occurs when B[0:15] are loaded into the same block as A[0:7]):

| Α | 0 (M)  | 1 (M)  | 2 (M)  | 3 (M)  | 4 (M)  | 5 (M)  | 6 (M)  | 7 (M)  |
|---|--------|--------|--------|--------|--------|--------|--------|--------|
| В | 0 (M)  | 2 (M)  | 4 (M)  | 6 (M)  | 8 (M)  | 10 (M) | 12 (M) | 14 (M) |
|   |        |        |        |        |        |        |        |        |
| Α | 8 (M)  | 9 (H)  | 10 (H) | 11 (H) | 12 (H) | 13 (H) | 14 (H) | 15 (H) |
| В | 16 (M) | 18 (H) | 20 (H) | 22 (H) | 24 (H) | 26 (H) | 28 (H) | 30 (H) |

For the next 62 block accesses for array A (next 104 block accesses for array B), array A has 15 hits for every 16 accesses, and array B has 14 hits for every 16 accesses.

For the last block access for array A, the hits and misses are as follows (cache thrashing occurs when B[2032:2046] are loaded into the same block as A[1016:1023]):

| Α | 1008 (M) | 1009 (H) | 1010 (H) | 1011 (H) | 1012 (H) | 1013 (H) | 1014 (H) | 1015 (H) |
|---|----------|----------|----------|----------|----------|----------|----------|----------|
| В | 2016 (M) | 2018 (H) | 2020 (H) | 2022 (H) | 2024 (H) | 2026 (H) | 2028 (H) | 2030 (H) |
|   |          |          |          |          |          |          |          |          |
| A | 1016 (H) | 1017 (M) | 1018 (M) | 1019 (M) | 1020 (M) | 1021 (M) | 1022 (M) | 1023 (M) |
| В | 2032 (M) | 2034 (M) | 2036 (M) | 2038 (M) | 2040 (M) | 2042 (M) | 2044 (M) | 2046 (M) |

#### Common mistakes:

- Not accounting for the cache thrashing at the start: 953 ( $63 \times 15+8$ ) for array A, 889 ( $63 \times 14+7$ ) for array B.
- Not accounting for the cache thrashing at the end: 952 (7+63 $\times$ 15) for array *A*, 889 (7+63 $\times$ 14) for array *B*.
- Not accounting for the cache thrashing at the start and end: 960 (64 $\times$ 15) for array A, 896 (64 $\times$ 14) for array B.